Methods of representing Korean and Chinese characters
using a limited number of keystrokes on a standard keyboard
are presented. Various attempts have been made to find the
most efficient way to represent these characters, such as
enumeration methods, 16-bit coding for Korean character
syllables, and the meaning and sound method for Chinese
characters. Details of these are explained with a brief
introduction to some general properties of the Korean and
Chinese characters currently used in Korea.
I. INTRODUCTION



The development of computers and information processing
has reached the stage of being able to handle Korean and
Chinese character input and output. Information systems have
no difficulty with the input and output of characters from a
standard Roman character keyboard, but the problems related
to non-Roman characters, from I/O to the software problems
of language handling, remain almost unsolved. Until recently
the computer could not handle Korean or Chinese characters
efficiently. It was not user friendly, and data processing
in Korea was imperfect and very unwieldy. Among these
problems, the biggest issue is how to enter 2,369 Korean
and 1,800 common Chinese characters from the standard Roman
character keyboard.

During the last few years, universities, research
institutes, and manufacturers have made great efforts to
develop good I/O devices for Korean characters. In Korea,
natural language processing, especially Korean language
processing, is one of the essential elements for the future
of computer and information systems.

First, the properties of Korean and Chinese characters
will be presented as an introduction for those unfamiliar
with these characters. Then, the resolution power of CRTs
and dot matrix printers and its relation to the shape
characteristics (readability, aesthetic quality, etc.) of
Korean and Chinese characters will be discussed. The
methods developed for Korean and Chinese character I/O can
be applied to other character sets, especially to many
non-Roman alphabetic character sets, not to mention Chinese
characters in China.
II. BACKGROUND



A. PROPERTIES OF KOREAN DOCUMENTS

Common documents in Korea are usually written in a mixed
form utilizing Korean and Chinese characters. Minor use is
made of Roman script. The usage of each character set
depends on the kind of document. In order to perform word
processing efficiently in Korea, the simultaneous editing of
these characters is essential. Table I shows the proportions
of characters found in various types of documents. These
data are based on sampling performed expressly for this
study. The following sources were used in the sampling
process to construct Table I:

1. newspaper - Korean Daily Times, "3A Era", 16
September 1984

2. journal - "National Security", June 1984

3. technical papers (A) - "COBOL Programming", Dong-A
Publishing Co., 1978

4. technical papers (B) - "Introduction to Law", Beob
Moon Sa Publishing Co., 1978

5. business papers - Korean Air Lines Co. 

Although the sample was taken from a single source for each 
kind of document, it is the authors’ view that the documents 
selected are representative of the entire population of each 
type. 

B. CHARACTERISTICS OF KOREAN SCRIPT



The native Korean alphabet was introduced in 1446, after
centuries of the use of a more cumbersome method (known as
IDU) to transcribe Korean with Chinese characters. The set
of 28 letters¹ (now 24 letters) was designed by a group of
scholars commissioned by King Sejong (1419 - 1450), the
fourth King of the Yi dynasty.

                            TABLE I

               Proportions of Written Characters

                     News-              Technical paper   Business
                     paper   Journal     (A)      (B)      paper

 Roman script          1%       3%       40%       0%       10%
 Korean script        84%      76%       55%      55%       80%
 Chinese character    15%      21%        5%      45%       10%

 * (A): Technical papers from western countries
 * (B): Traditional and historical papers

The Korean language is spoken, and its alphabet written,
by an estimated 50 million people on the Korean peninsula
and its coastal islands. Many among the approximately one
million Koreans residing in Japan, China, and America still
speak and write the language [Ref. 9].

The Korean alphabet currently used consists of 14
consonants (ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ) and 10
vowels (ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ). There are also 17
compound consonants and 11 compound vowels (ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ
ㅚ ㅝ ㅞ ㅟ ㅢ). The letters of the Korean alphabet cannot be
used independently but are used to build syllables. Each
Korean character consists of two or three parts. The first
part must be a consonant or compound consonant. There are 19
letters that are possible for the first part of the Korean
character (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ
ㅎ). The second part of the Korean character is a vowel or
compound vowel. There are 21 possible letters for the second
part (ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ
ㅣ). The third part of the Korean character is optional and
depends on the character being depicted. The third part, if
present, must be a consonant or a compound consonant. There
are 28 letters possible as the third part. This section is
summarized in Figure 2.1.

¹ A letter is an element of a character. A character
consists of two or three letters. The letters used in Korea
are a set of 14 consonants and 10 vowels.

The Korean system of writing is called "Hangul". It is
"phonetic" writing, like English, in the sense that the
symbols represent sounds, that is, consonants and vowels.
Unlike English symbols, which are grouped directly into
words (e.g., E+n+g+l+i+s+h = English), Korean symbols are
first grouped by syllable (e.g., H+a+n g+u+l = Han gul)
[Ref. 10].

Korean symbols are written in syllabic groupings. An
enumeration method² puts letters side by side, as in
"LONDON", but the Korean language stacks the letters in most
characters, so the letters of a word such as "LONDON" would
be stacked into syllable groups rather than set in a row.
The simplest syllable is written with one consonant and one
vowel. When one writes the symbol for a vowel alone, one
must add the consonant symbol "ㅇ", which indicates an
initial mute (and is classed as a consonant). In this
simple consonant and vowel syllable, there are two types of
arrangement: the side-by-side arrangement (e.g., 가) and the
top-to-bottom arrangement (e.g., 고). The particular vowel
being written determines which arrangement is used.

² In an enumeration method, letters are placed side by
side, element by element, using a set of consonants and
vowels.

THE CHARACTERISTICS OF KOREAN CHARACTER

THE KOREAN ALPHABET CONSISTS OF 24 BASIC LETTERS (ELEMENTS):

  14 CONSONANTS: ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
  10 VOWELS    : ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ

EACH CONSONANT AND VOWEL CAN BE COMPOUNDED

. POSSIBLE COMPOUND CONSONANTS
  ㄲ ㄳ ㄵ ㄶ ㄸ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁㄱ ㅃ ㅄ ㅆ ㅉ
. POSSIBLE COMPOUND VOWELS
  ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ ㅚ ㅝ ㅞ ㅟ ㅢ

EACH CHARACTER CAN BE DIVIDED INTO THREE PARTS (FIRST
SOUND, MIDDLE SOUND, FINAL SOUND) OR TWO PARTS (FIRST AND
SECOND SOUND).

. THE FIRST PART MUST CONSIST OF A CONSONANT OR A COMPOUND CONSONANT
. THE SECOND PART MUST CONSIST OF A SINGLE OR A COMPOUND VOWEL
. THE THIRD PART IS OPTIONAL. IF USED, IT MUST BE A CONSONANT.

. THE FOLLOWING LETTERS CAN BE USED AS THE FIRST PART:
  ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
  19 LETTERS

. THE FOLLOWING LETTERS CAN BE USED AS THE SECOND PART:
  ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
  21 LETTERS

. THE FOLLOWING LETTERS CAN BE USED AS THE THIRD PART:
  ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅁㄱ ㅂ ㅄ ㅅ ㅆ ㅇ
  ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
  28 LETTERS

. NUMBER OF POSSIBLE COMBINATIONS OF CHARACTER = 19 x 21 x 29 = 11,571
  IN PRACTICE, ONLY ABOUT 2,400 CHARACTERS ARE USED.

Figure 2.1 The Korean Alphabet

Representing these character syllables through a
computer creates a problem because the shape of each letter
(consonant or vowel) can differ, owing to the requirement
that each character be balanced, i.e., have the same size
and achieve a desired aesthetic quality. For example, when ㄱ
is placed to the left of a vowel, the downward portion is
slanted (e.g., 가). When it is placed on top of the vowel,
the downward portion becomes straight (e.g., 고). As these
examples show, it is very difficult to apply the different
shapes of a particular letter to a line printer or a
typewriter. This problem will be discussed in detail in the
following chapter.

By mathematical calculation, the possible number of
Korean characters is 11,571 (19 * 21 * 29), where 29 counts
the 28 possible third letters plus the case of no third
letter. It must be noted, though, that only 2,369 characters
are commonly used [Ref. 8: p. 11].
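
The count above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and constant names are illustrative and do not come from the study.

```python
# Letter counts as stated in the text and in Figure 2.1.
FIRST_SOUNDS = 19   # consonants or compound consonants
SECOND_SOUNDS = 21  # vowels or compound vowels
THIRD_SOUNDS = 28   # optional final consonants


def possible_characters(first=FIRST_SOUNDS, second=SECOND_SOUNDS,
                        third=THIRD_SOUNDS):
    """Count syllable blocks; the optional third part adds a 'none' choice."""
    return first * second * (third + 1)


total = possible_characters()
commonly_used = 2369  # figure quoted in the text from Ref. 8

print(total)                                  # 11571
print(f"{100 * commonly_used / total:.1f}%")  # 20.5%
```

Only about one fifth of the combinatorially possible characters are in common use, which is why a code that can address every combination leaves many values unused.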

C. CHARACTERISTICS OF SINO-KOREAN CHARACTERS 

Sino-Korean characters are Chinese characters used in
Korea. They are different from those used in China. Koreans
refer to Chinese characters as Hanja. Chinese characters
have a long history, the earliest discovered writings having
been dated from about the 14th century B.C. In 109 A.D.,
during the Han Dynasty, the script was modified by Hsu Shen
(許慎, 30 - 124) in his 15-volume paleographical work,
Shuo-wen Chieh-tzu (說文解字), which translates to "the
explanation of writing and analysis of words". That work
lists 9,353 characters under 540 radical entries. Of this
number, 364 are pictographic, 125 simple ideographic, 1,167
compound ideographic, and 7,697 phonetic compounds.
The most complete collection, the Kang Hsi Dictionary,
with about 50,000 characters, was published in 1716. After
the establishment of the People's Republic of China in 1949,
the Chinese government actively pursued language reform
until the Cultural Revolution (1966-1976), changing and
simplifying the characters from their original forms
[Ref. 5: p. 15].

The number of characters used commonly is from 1,000 to
3,000. Table II [Ref. 1: p. 819] shows the frequency of
Chinese characters used in typical documents.

                              TABLE II

          Frequency of Chinese Characters Used in Documents

                 News-    General  |  Total     News-    General
                 papers   Document |  Document  papers   Document
                  (%)       (%)    |    (%)     (chrs)   (chrs)

 1st 10 chrs      10.0      8.8    |    80        499       638
       50         27.5     25.5    |    85        615       777
      100         38.9     36.1    |    90        781       992
      200         55.4     51.0    |    95       1068      1358
      500         79.0     73.5    |    96       1156      1479
     1000         93.1     89.0    |    97       1269      1617
     1500         97.4     95.0    |    98       1421      1832
     2000         98.7     97.6    |    99       1661      2157
     2500         98.9     99.4    |   100       2879      3323
     3000      (illegible) 99.8    |

 * chrs: abbreviation of characters

In 1972, the Korean Ministry of Education suggested that
1,800 Chinese characters be learned and used for educational
purposes [Ref. 3]. In this study, the authors will restrict
themselves to that set of 1,800 characters. The Chinese
characters are called Hantzu in Chinese, Hanja in Korean,
and Kanji in Japanese. All mean "Han characters" (漢字).

These characters are used exclusively in Chinese writings,
and in combination with the Hangul (Korean) alphabet in
Korea and with the Kana syllabaries in Japan. The
Sino-Korean script (Hanja), in written form, is a
combination of three major elements: pictograms, ideograms,
and phonograms [Ref. 5: p. 22].

In the next chapter each character will be treated as a
picture, because of both the complexity of Chinese
characters and the ease of representing a picture in the
computer. Each Chinese character has a meaning and a sound;
for example, 天 means heaven and its sound is cheon. There
are also many characters which have different meanings but
the same sound, or the same meaning but different sounds.
There are several methods for resolving this ambiguity.
Appendix A [Ref. 5: p. 17] shows the evolution of Chinese
characters.
III. PROBLEMS OF EDITING KOREAN AND CHINESE SCRIPTS

A. CURRENT EDITING TECHNOLOGY

The current word processing practice in Korea is to type
Korean characters by the enumeration method, that is, to
input letters (8-bit code: consonants and vowels in
sequence) and to output these letters as character syllables
using a Korean character conversion program. Appendix B
shows the EBCDIC input codes currently used by FACOM, and
Appendix C depicts the MDS (Mohawk Data Sciences) input
codes used by IBM. To type Chinese characters the following
sequence is followed:

1. Depressing a Chinese character function key.

2. Typing the sound character of a Chinese character
using the enumeration method.

3. Displaying all homonymous characters (from 1 to 60)
[Ref. 4: p. 34] that have the same sound.

4. Selecting one character by using an index number, and
entering the character into a buffer or file.
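
The four steps above can be sketched as a lookup followed by an indexed selection. The sketch below is only an illustration: the table holds a handful of characters read "cheon", and the names `HOMONYMS`, `list_candidates`, and `select_character` are hypothetical, not taken from the IBM or FACOM systems.

```python
# Illustrative homonym table: a few characters whose Korean sound is "cheon".
# A real system would cover the full character set (1 to 60 per sound).
HOMONYMS = {
    "cheon": ["天", "千", "川", "泉"],
}


def list_candidates(sound):
    """Step 3: return (index number, character) pairs for one sound."""
    return list(enumerate(HOMONYMS.get(sound, []), start=1))


def select_character(sound, index):
    """Step 4: pick one candidate by its 1-based index number."""
    candidates = HOMONYMS.get(sound, [])
    if not 1 <= index <= len(candidates):
        raise IndexError(f"no candidate {index} for sound {sound!r}")
    return candidates[index - 1]


buffer = []
for number, char in list_candidates("cheon"):
    print(number, char)
buffer.append(select_character("cheon", 1))
print("".join(buffer))  # 天
```

Every Chinese character costs a function key, the sound, a visual search, and an index number, which is the inconvenience the later sections try to remove.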

Machines dealing with Korean language data are currently
available from the IBM and FACOM corporations in Korea;
IBM's Multistation 5550 (1984) and FACOM OS IV (KEF) (1982)
are newly updated and well developed machines. These
machines still have several disadvantages in handling Korean
and Chinese characters:

1. A large amount of time is spent in character
conversion.

2. It is difficult to directly delete and insert records
in a file.

3. The word processing editor cannot recognize the
characters being edited before executing a character
conversion program, since only the enumerated letters
can be displayed.

4. The method of entering characters is inconvenient and
requires a tremendous amount of effort for Chinese
characters.

5. One cannot convert all Korean character syllables
into Chinese characters because there is not a one to
one mapping.

6. Data communication is impossible since there are no
standard codes for Korean and Chinese characters.

Appendices D and E show the keyboards of the IBM
Multistation 5550 [Ref. 8: p. 14] and the FACOM OS IV (KEF)
[Ref. 7: p. 48], respectively.

B. USER REQUIREMENTS

Most potential users have recognized that the computer
is essential in data processing and office automation.
However, because of the above constraints, current machines
are unsatisfactory for use with the Korean language. Some
general user requirements of computer researchers and
manufacturers are the following:

1. Users want to use Korean language commands and
programs, but there are no Korean language oriented
operating systems or programming languages comparable
to COBOL, FORTRAN, Pascal, etc.

2. Users want to edit three kinds of characters
simultaneously and in a user friendly manner.

3. Users want to display and print out data without
using a conversion program, such as is now needed for
the Korean alphabet, because of the time, memory
space, and inconvenience involved.

4. Users want to use interactive files and database
processing.

In summation, they want to use computers that handle three
kinds of script in the same manner in which present
computers handle the Roman alphabet.

C. PROBLEMS OF REPRESENTATION OF THE THREE KINDS OF SCRIPTS

Because of the characteristics of Korean and Chinese
characters, the following problems occur:

1. How can one enter 2,400 Korean characters and 1,800
Chinese characters into a computer through a limited
number of keystrokes?

2. How can one develop a system program that directs
input and output without using a conversion program?

3. How can the aesthetic quality of display and output be
improved?

4. How can one increase the processing speed and reduce
the memory space needed for these character definitions?

There are other problems, but the above are the most
significant. Among them the first is the most serious, and
consequently the authors will give it more attention in this
study.
IV. POSSIBLE METHODS FOR KOREAN LANGUAGE DATA PROCESSING

In order to solve the problems which were mentioned in 
the previous chapter, the following methods are offered as 
possible alternatives for Korean language data processing. 

A. 8-BIT CODE FOR KOREAN ALPHABET

Since the Korean alphabet consists of only 24 letters,
Korean language data can be expressed using only Korean
characters without a serious problem. The enumeration
method, as used with the Roman alphabet, is the easiest way
to represent Korean characters without changing the hardware
and the operating system. However, this method is not highly
readable and would require changes in the language which may
not be acceptable to users.
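
A minimal sketch of the enumeration method follows. It assumes the modern Unicode layout of Hangul syllables, which is not part of this study and which provides 27 final letters rather than the 28 listed in Figure 2.1. The function name `enumerate_letters` is illustrative.

```python
# Letter tables in Unicode order (an assumption; not the codes of this study).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ",
          "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ",
          "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

SYLLABLE_BASE = 0xAC00  # first precomposed Hangul syllable in Unicode
SYLLABLE_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)


def enumerate_letters(text):
    """Write each syllable block as its letters placed side by side."""
    pieces = []
    for ch in text:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            first, rest = divmod(offset, len(MEDIALS) * len(FINALS))
            second, third = divmod(rest, len(FINALS))
            pieces.append(INITIALS[first] + MEDIALS[second] + FINALS[third])
        else:
            pieces.append(ch)  # leave non-Hangul characters untouched
    return "".join(pieces)


print(enumerate_letters("한글"))  # ㅎㅏㄴㄱㅡㄹ
```

The output shows why readability suffers: the syllable boundaries that a Korean reader relies on are no longer visible in the linear sequence.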

1. Using the Current Standard Keyboard

A program can be loaded which defines the 24 letter 
Korean alphabet to a character generator instead of the 
lower case Roman alphabet. All Korean alphabet elements and 
the upper case Roman alphabet characters are then available 
through the standard Roman character keyboard. With this 
method the user can use a computer in a similar manner as 
the users who use the Roman alphabet. In addition, well 
developed hardware and software can be used without critical 
problems. This method has been suggested by many groups of 
people from the time when the Korean typewriter was first 
developed. The only disadvantage is the breaking of tradi- 
tional custom. To capitalize on developed technology and for 
the ease of application, more study and research should 
center on user acceptability of the enumeration method. 
Figure 4.1 shows an example of hard copy which uses a
graphic dot printer and a standard keyboard. Appendix F
shows the load command program for an alternative character
generator for the Korean alphabet. This program can be
generated easily by the alternative character set editor,
and it loads the Korean alphabet to an alternative character
generator in place of the lower case Roman alphabet.

Figure 4.1 Example Using Standard Keyboard.

2. Using the Capital Letters as the Initial Letter

The major difficulty with the enumeration method is
poor readability. Korean users read a sentence sequentially
syllable by syllable. In order to increase readability, the
initial letter of each character can be written as an upper
case letter to distinguish the syllables easily. Figure 4.2
shows an example using the capital letters, and Appendix G
represents the load command program for these letters. A
special mark or an altered shape for each letter can also be
applied to increase readability when an enumeration method
is used.

Figure 4.2 Example Using Capital Letters.



B. 16-BIT CODE FOR THE THREE KINDS OF SCRIPT

There are various methods one can use to enter Korean
and Chinese characters, but the 16-bit code is one of the
better methods, since it can identify all possible Korean
and Chinese characters without using the enumeration method
and a conversion program. The structure of this code will be
discussed briefly in the following subsection.

1. 16-bit Code for Korean Script



As mentioned before, a Korean character syllable
consists of three parts:

1. First sound: one of 19 simple or double consonants.

2. Second sound: one of 21 simple or compound vowels.

3. Third sound: one of 28 simple or compound consonants
(optional).

Since each of the first, second, and third sounds has
fewer than 32 possible letters, 5 bits are enough to
identify each sound. All possible Korean characters can
therefore be identified using 15 bits. The first of the 16
bits is used to indicate a Korean character (by a 0). The
next 5 bits are used for the first sound, the following 5
bits for the second sound, and the final 5 bits for the
third sound. Table III shows the structure of the 16-bit
code for the Korean character, and Table IV gives the 16-bit
code itself. This code table is basically the same as the
IBM 2-byte internal Korean character code [Ref. 6: p. 52].
The only difference is the arrangement. Some IBM codes
represent three letters. This makes the key tops (the face
of each key) more complex; for example, 00100 (one key top)
represents the ㄴ, ㅐ, and ㄳ values. The code suggested in
Table IV reduces some of this complexity by limiting the
possible values to no more than two for each key top. In
contrast to the example for the IBM codes, the same code
from Table IV represents only one value. Appendix H
represents the IBM 2-byte internal Hangul code for the
Korean character.

                          TABLE III

          Structure of 16-bit Code for Korean Script

      <------- 1st Byte -------><------- 2nd Byte ------->
      +---+---------------+---------------+---------------+
      | 0 |   1st sound   |   2nd sound   |   3rd sound   |
      |   |    5 bits     |    5 bits     |    5 bits     |
      +---+---------------+---------------+---------------+

      * 0: Korean character
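
The bit layout of Table III can be illustrated with a short packing routine. This is a sketch under stated assumptions: the function names are illustrative, and the 5-bit values in the example are arbitrary placeholders rather than entries read from Table IV.

```python
def pack_korean(first, second, third=0):
    """Pack three 5-bit sound values into 16 bits; the top bit stays 0."""
    for value in (first, second, third):
        if not 0 <= value < 32:
            raise ValueError("each sound value must fit in 5 bits")
    return (first << 10) | (second << 5) | third


def unpack_korean(code):
    """Split a 16-bit Korean character code back into its three sounds."""
    if code >> 15:
        raise ValueError("top bit is 1: not a Korean character code")
    return (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


code = pack_korean(0b00001, 0b00010, 0b00100)
print(f"{code:016b}")       # 0000010001000100
print(unpack_korean(code))  # (1, 2, 4)
```

Because each sound occupies a fixed field, an editor can read or change one part of a character with a shift and a mask, with no conversion program in between.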

The suggested code has several advantages. First, it
is easy to sort characters by code value, since the value of
each letter follows the order of the Korean alphabet.
Second, it can reduce the memory space for data by using 2
bytes instead of 3 bytes for one character. Third, it is
possible to edit the character directly, since it does not
need code conversion. Finally, since the code value of a
Korean character can be recognized easily, it helps the
programmer when writing programs.
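
The first two advantages can be illustrated as follows. The sketch assumes that the 5-bit values are assigned in alphabetical order, as the text states; the sample triples are placeholders, not entries from Table IV.

```python
def pack(first, second, third=0):
    """Same layout as Table III: flag bit 0, then three 5-bit fields."""
    return (first << 10) | (second << 5) | third


# (first, second, third) sound values of four hypothetical characters.
syllables = [(7, 2, 0), (1, 2, 4), (1, 2, 0), (1, 3, 0)]
codes = [pack(*s) for s in syllables]

# Sorting the integer codes gives the same order as sorting by
# first sound, then second sound, then third sound.
assert sorted(codes) == [pack(*s) for s in sorted(syllables)]

# Each character is stored in exactly 2 bytes.
stored = b"".join(c.to_bytes(2, "big") for c in sorted(codes))
print(len(stored))  # 8
```

Storing the codes big-endian keeps the byte-wise order equal to the numeric order, so an ordinary byte comparison also sorts the data.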



TABLE IV 

16-bit Code for Korean Script 



5 bit 
code 


1st 

sound 

letter 


2nd 

sound 

letter 


3rd 

sound 

letter 


5 bit 
code 


1 St 
sound 
letter 


2nd 
sound 
lett er 


3rd 

sound 

letter 


00000 








10000 






So 


00001 


"7 






10001 


□ 




□ 


00010 


in 


1- 


11 


10010 




IL 


OT 


0001 1 




H 


Ik 


10011 


t! 




U 


00100 


L 




L. 


10100 


bU 


T 




00101 






LX 


1010 1 




T-j 


HA 


00110 




H 


Lo 


10110 


A 




A 


0011 1 


c 




d 


1011 1 


AA 


7=*1 




01000 


zz 


-1 




1 1000 


o 




O 


0 1001 


£ 




£ 


1100 1 


A 




X 


01010 






ST 


11010 


XX 


r\ 1 




01011 




A 


ED 


11011 




7T 


A 


01100 




=11 


2a 


11100 




— 


=7 


0 1101 




J- 


Zk 


11101 


E 




E 


01 1 10 






2£ 


11110 


JC 


1 


I 


0 1111 






£JL 


11111 


a. 

o 




o 



* Blank: Not used 
2. 16-bit Code for Chinese Characters

There is no limit on the number of usable Chinese
characters, but statistics show that 1,800-3,000 characters
cover 98-99.8 percent of those which appear in newspapers
and journals (Table II). Currently there are only two ways
to represent Chinese characters in Korea. One method
comprises two steps. The first step is to display all
Chinese characters (homonyms) which have the same sound
after the desired sound has been entered, and the second
step is to select the needed character in the display and
enter it via the index number matched to that character.
The other method is to type Korean characters as a unit of
a word, which consists of two or three characters, and then
convert them to Chinese characters.

The former is inconvenient and takes a long time to
edit. The latter has no flexibility in that it is limited by
the programmed word codes. To solve the above problems and
simplify the identification of each character using a
limited number of keystrokes, a 16-bit code for Chinese
characters can be applied. Table V represents the structure
of the 16-bit code for Chinese characters.

Chinese characters represent both meaning and
phonetics to Koreans. To simplify the code, the complete
meaning and sound of a Chinese character are not needed.
Each Chinese character is described by one to five
syllables for its meaning and one character for its sound.
Simplicity can be achieved by employing abbreviations or
acronyms for each part (meaning and sound). For example, the
Chinese character (天) has the meaning "Hea-Ven" and the
sound cheon. In this case we use H of Hea, V of Ven, and C
of cheon as the code for (天).
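
A minimal sketch of this acronym code follows, assuming the same 1 + 5 + 5 + 5 bit layout as the Korean code but with the leading bit set to 1. Mapping the Roman letters H, V, and C to the values 7, 21, and 2 is purely illustrative; the study assigns 5-bit values to Korean letters, not Roman ones.

```python
CHINESE_FLAG = 1 << 15  # leading bit 1 marks a Chinese character


def letter_value(letter):
    """Illustrative 5-bit value for a Roman acronym letter (A=0 ... Z=25)."""
    return ord(letter.upper()) - ord("A")


def pack_chinese(meaning1, meaning2, sound):
    """Pack two meaning acronyms and one sound acronym into 16 bits."""
    values = [letter_value(x) for x in (meaning1, meaning2, sound)]
    if any(not 0 <= v < 32 for v in values):
        raise ValueError("acronym letters must map to 5-bit values")
    first, second, third = values
    return CHINESE_FLAG | (first << 10) | (second << 5) | third


def is_chinese(code):
    """True when the leading bit says the code is a Chinese character."""
    return bool(code & CHINESE_FLAG)


code = pack_chinese("H", "V", "C")  # Hea-Ven, cheon
print(f"{code:016b}")               # 1001111010100010
print(is_chinese(code))             # True
```

Three keystrokes identify the character, and the leading bit lets the same 16-bit stream mix Korean and Chinese characters without a separate mode switch.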
                            TABLE V

        Structure of 16-bit Code for Chinese Characters

      <------- 1st Byte -------><------- 2nd Byte ------->
      +---+---------------+---------------+---------------+
      | 1 |  1st letter   |  1st letter   |  1st letter   |
      |   |  of           |  of           |  of           |
      |   |  1st meaning  |  2nd meaning  |  3rd meaning  |
      |   |  character    |  character    |  character    |
      +---+---------------+---------------+---------------+

      * 1: Chinese character



But this method may result in duplicate codes, since
different Chinese characters, with other meanings, may have
the same value as HVC. In order to eliminate the duplicate
codes and to use a 3 letter code which is compatible with
the 16-bit Korean character code, the following
characteristics of the sound and meaning of Chinese
characters are relevant. First, only 428 syllables are used
to represent the sounds for all Chinese characters. That
is, one sound can represent 1 to 60 Chinese characters.
Second, the frequency of Korean characters used for the
meaning and sound is irregular in distribution. More
specifically, 20% of Korean characters are used to represent
the sound and meaning of 95% of Chinese characters [Ref. 11].
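
The duplicate-code problem can be made concrete with a small check. This is only a sketch: the entries are invented placeholders, not data from the study, and the function names are hypothetical.

```python
from collections import defaultdict


def find_duplicates(entries):
    """Group characters by acronym code; keep codes shared by two or more."""
    groups = defaultdict(list)
    for char, code in entries:
        groups[code].append(char)
    return {code: chars for code, chars in groups.items() if len(chars) > 1}


def duplicate_ratio(entries):
    """Fraction of characters whose code is not unique."""
    duplicated = sum(len(chars) for chars in find_duplicates(entries).values())
    return duplicated / len(entries)


# Placeholder characters A..D with made-up three-letter acronym codes.
entries = [("A", "HVC"), ("B", "HVC"), ("C", "RVC"), ("D", "SPC")]

print(find_duplicates(entries))  # {'HVC': ['A', 'B']}
print(duplicate_ratio(entries))  # 0.5
```

Rearranging which letters share a 5-bit value changes how the characters fall into groups, which is how the rearranged codes discussed next lower the duplicate proportion.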

As a result of analyzing the 1,800 sound characters
and 1,438 meaning characters used to represent the Chinese
characters, Table VI and Table VIII are derived. Table VI
represents the number of Chinese characters which have the
same first sound letter and the same second sound letter.
For example, 266 Chinese characters have the first sound
letter (ㄱ), and 44 Chinese characters have (ㄱ) as a first
sound letter and (ㅏ) as a 2nd sound letter. Since the
distribution of sound characters is irregular, the sound
acronyms must be rearranged to fit the 5-bit code. Table VII
depicts the rearranged code values for the acronym of the
Chinese sound character.

                               TABLE VI

            Characters Having Same 1st & 2nd Letter Sound

            2nd                          1st letter
 Group     letter   ㄱ   ㄴ   ㄷ   ㄹ   ㅁ   ㅂ   ㅅ   ㅇ   ㅈ   ㅊ   ㅌ   ㅍ   ㅎ

 ㅏ group    ㅏ      44    8   26   17   20   27   62   25   41   22   17   11   27
            ㅐ      12    6    8    4   13   12    4    5   11    7    8    2    9
            ㅑ       -    -    -    9    -    -    -   17    -    -    -    -    5
            ㅒ       -    -    -    -    -    -    -    -    -    -    -    -    -

 ㅓ group    ㅓ      18    -    1    -    -   11   42   12   59   28    -    -    7
            ㅔ       1    -    -    -    -    -    6    -   14    2    -    -    -
            ㅕ      34    4    -   26   12   12    -   43    -    -    -    7   18
            ㅖ      15    -    -    -    -    -    -    4    -    -    -    6    3

 ㅗ group    ㅗ      34    5   36   13   18   21   28   16   28   15    9   10   24
            ㅘ      22    -    -    -    -    -    -    7    4    -    -    -   24
            ㅙ       1    -    -    -    -    -    2    -    -    -    -    -    -
            ㅚ       4    2    -    2    -    -    1    2    1    2    1    -    8
            ㅛ       8    -    -    3    5    -    -   13    -    -    -    4    3

 ㅜ group    ㅜ      32    -    6    5   16   33   44   22   22   18    5    4    6
            ㅝ       6    -    -    -    -    -    -   13    -    -    -    -    -
            ㅞ       -    -    -    -    -    -    -    -    -    -    -    -    1
            ㅟ       3    -    -    -    -    -    -   14    -    6    -    -    2
            ㅠ       5    -    -   12    -    -    -   23    -    -    -    -    4

 ㅡ group    ㅡ      20    1    4    1    -    -   10   12    8    4    1    -    3
            ㅢ       -    -    -    -    -    -    -   10    -    -    -    -    6
            ㅣ      16    1    -   13   14   19   37   29   36   18    -    9    -

 Subtotal          263   27   81  105   98  135  237  265  224  122   41   53  148

 Total 1,800

 * -: no characters




TABLE VII 

5-bit Code for the Acronym of Sound Character 



code 


sound acronym 


code 


sound acronym 


00000 


y\- 


10000 


A 


00001 


7i 


10001 


oK 


00010 


jL 


100 10 




000 11 


•f 


100 1 1 


3- 


00100 


CL 


10100 


O 

-r 


00101 


L- 


10101 




C0110 


C. 


101 10 




00111 




10111 




01000 


s 

# 


1 1000 




01001 


a 


1100 1 


A 

'T 


01010 


ti* 


110 10 




01011 


y 

• 


11011 


~x 


01100 


a1“ 


11100 


E 


01101 




1110 1 


JL 


01 1 10 




11110 


o* 


01111 


T 


11111 


o' 

• 



In Table VII the second letter (ㅏ, ㅓ, ㅗ, ㅜ, ㅡ)
stands for all the letters of its group. For example, (ㅏ)
represents ㅏ, ㅐ, ㅑ, and ㅒ, and (ㅓ) represents ㅓ, ㅔ, ㅕ,
and ㅖ, etc. In addition, one mark represents the ㅏ, ㅓ, ㅣ
group, in which the first consonant is placed at the left of
the vowel, and another describes the ㅗ, ㅜ, and ㅡ group, in
which the first consonant is placed above the vowel.

Since the frequencies of the Korean character syllables
representing the sounds and meanings of Chinese characters
are different, the frequency of the Korean characters used
to represent the meanings of Chinese characters also needs
to be analyzed. After analyzing the sampled 1,438
characters, which are the first and the second meaning
characters, Table VIII is derived; it shows the number of
times each meaning character or group is used.

On the basis of Table VIII, the meaning acronym values
can be reassigned to a 5-bit code. Table IX shows the
reassigned 5-bit codes representing the acronym of the
meaning character. The same theory is applied in deriving
Table IX as in Table VII. As the acronym code is
rearranged, the proportion of duplicate codes can be
reduced. Table X shows the proportion of duplicate codes
that results from applying these rearranged codes. It
compares the pure acronym code (Table IX), which represents
the acronym of a meaning and a sound character by the first
letter code of Korean characters (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ
ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ: 19 possible consonants), the
arranged sound character acronym code (Table VII), and the
arranged sound and meaning character acronym code (Table
VIII).

The reasons why some duplicate codes cannot be
eliminated are as follows. First, some Chinese characters
have similar meanings and sounds, which generate the same
acronym code (22 among 1,800). Second, there are some
Chinese characters which have the same meaning and sound to
begin with (12 among
TABLE VIII



Frequency of Meaning Character in Chinese Characters 



2nd letter 



1st 

It 


1- group 


^ group 


-L group 


T group 


— gr ou p 


TOT 




1- 


H 




->\ 


=il 


X 


J-h 1 


T 


T-^ 


r| 


1 — 


1 




“7 


i»2 


6 


25 


9 


22 


25 


6^ 


1 6 


27 


3 


9 


26 


30 


236 


ni 


1 


4 


2 






2 




1 


4 


1 




4 


4 


23 


U 


23 


8 


7 




3 


9 




1 


7 






13 


2 


73 


C 


23 


9 


12 






24 


2 


1 


17 




1 


1 1 


3 


103' 


cc 


1 


1 


3 


2 










2 




2 


1 


1 


13 


E. 


10 


6 


5 




7 


10 




2 


4 




1 


28 


31 


107 


□ 


39 


13 


10 


1 


3 


17 




2 


42 






1 


4 


S3 


y 


24 


5 


11 




13 


1 1 






15 








10 


89 


m 


2 


2 


1 




2 








1 






1 




9 


A 


32 


1 1 


21 


8 




15 




4 


17 




1 


1 1 


19 


107 


M 


5 




1 






15 


1 


2 








4 




10 


O 


28 


6 


29 


4 


19 


15 


1 


2 


22 


4 


2 


57 


55 


246 


A 


18 


5 


8 


2 




6 






12 






3 


39 


93 




2 










1 






1 








4 


8 


X 


5 


4 


6 






3 






4 






5 


13 


40 




2 


1 








1 










1 


9 




14 


£ 


8 


3 


4 






3 






2 






2 


1 


23 


TL 


4 


2 


1 




1 


1 






5 






1 


2 


17 




32 


2 


2 


2 




6 


7 










2 


10 


69 




301 


87 


149 


31 


70 


143 


16 


19 


132 


8 


17 


187 


228 


1438 



* 2nd letters having 10 are omitted for simplicity 
TABLE IX

5-bit Code for Acronym of Meaning Character

[Table body: each 5-bit code from 00000 to 11111, paired with its
1st and 2nd meaning acronym. The Hangul entries are not legible.]

1,800). For the characters which have duplicate codes,
the users must apply the exception rule. Alternatively, the
meaning of the character can be redefined to a synonym with
a different acronym. For example, if the wrong homonymous
Chinese character is displayed, the input operator may
select another form of the homonym by keying in the full
sound syllables instead of the acronym.



TABLE X

Proportion of Duplicate Code

Number of     Pure acronym   Rearranged sound   Rearranged sound and
characters    code           character code     meaning character code

1,800         7.55%          2.3?%              (not legible)




3. 16-bit Code for Roman Alphabet and Symbols

In order to use the three mixed kinds of character
code, simplify the I/O controller, and unify the word,
16-bit codes for the Roman alphabet and symbols should be
generated by only one keystroke. For data communication and
for familiarity, the 16-bit code for the Roman alphabet,
symbols, and control characters can be defined by adding only
a default byte (00H) in front of the ASCII code. When one uses
only Roman alphanumeric characters, one can easily convert
this 16-bit code to ASCII code. Table XI shows the 16-bit code
for the Roman alphabet and symbols.
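
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the "00H plus ASCII" rule; the function names `roman_to_code16` and `code16_to_ascii` are hypothetical and do not come from the study.

```python
def roman_to_code16(ch: str) -> int:
    """Return the 16-bit code for one ASCII character: 00H, then the ASCII byte."""
    value = ord(ch)
    if value > 0x7F:
        raise ValueError("not an ASCII character")
    return (0x00 << 8) | value


def code16_to_ascii(code: int) -> str:
    """Drop the leading 00H byte; valid only when the first byte is 00H."""
    if code >> 8 != 0x00:
        raise ValueError("first byte is not 00H, so this is not a Roman code")
    return chr(code & 0xFF)


codes = [roman_to_code16(c) for c in "School"]
print(" ".join(f"{c:04X}" for c in codes))
# prints: 0053 0063 0068 006F 006F 006C
```

Because the first byte is always 00H, converting back to ASCII is just a matter of discarding that byte.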

4. Keyboard for 16-bit Code

As mentioned in the previous chapter, the
biggest issue is how to enter all Chinese characters, Korean
characters, and the Roman alphabet with a simple keyboard.
In order to implement the 16-bit code on a keyboard, one would
have to provide keystrokes which generate "1" or "0" from a
Chinese character function key (bit 1), three 5-bit codes
(00000-11111) for Chinese and Korean characters (bits 2-16),
and the 16-bit code for the Roman alphabet and symbols. In this
case 33 more keys than the common Roman alphabet keyboard are



TABLE XI

Structure of 16-bit Code for Roman Alphabet

|<------- 1st Byte ------->|<------- 2nd Byte ------->|
|           00H            |       ASCII value        |
                              for Roman alphabet



needed. To reduce the number of keys, one more function key
can be added for the Roman alphabet, so that a 5-bit code key
generates the 16-bit code. However, identification by the user
will be complex because there must be 4 or 5 letters on each
key top. Table XII explains the 4 alternatives. Alternative
I includes 32 Roman alphabet characters on the 5-bit key tops,
using a Roman alphabet function key, and Alternative II
excludes the Roman alphabet from the 5-bit key tops.
Alternative A uses the acronyms of sound and meaning
characters, and Alternative B uses only the acronym of the
sound character for the sound and meaning characters.

To select the best one, the authors can use the
following criteria: flexibility of hardware and software
design, hardware efficiency, ease of maintenance, system
reliability, user characteristics, number of keystrokes,
number of duplicate codes, and complexity of recognizing a
certain keystroke [Ref. 2]. In the authors' opinion,
Alternative II-B should be selected for ease of explanation
and understanding. By selecting Alternative II-B, a
user can type the three kinds of characters together. For
example, to type "School is 학교 (Hak-kyo) in Korean and
學校 (Hak-kyo) in Chinese character", one can type directly
TABLE XII

Labeled Letters on 32 Key Tops

Alternative     Letters on each key top

I-A             1, 2, 3, 4, 5, 6
I-B             1, 2, 3, 4, 5
II-A            2, 3, 4, 5, 6
II-B            2, 3, 4, 5

Legend: 1  Roman alphabet
        2  First letter of Korean character
        3  Second letter of Korean character
        4  Third letter of Korean character
        5  Acronym of Chinese sound character
        6  Acronym of Chinese meaning character


only the Roman alphabet in the previous sentence without a
function key. The two syllables of "school" are formed as
follows. In Korean, the first syllable, 학, is selected from
ㅎ in the first sound letter position, ㅏ in the second sound
position, and ㄱ in the third position. The second syllable,
교, is "ㄱ, ㅛ, Default". Then, in Chinese, press the Chinese
character function key, which generates "1" as the first bit.
Each syllable is then typed as three acronyms: the acronym of
the first meaning character, the acronym of the second meaning
character, and the acronym of the sound character. After
typing Chinese characters, the user must release the function
key to type Korean characters. Table XIII explains the above
example.

As a result of the above example, the computer 
generates the following codes in hexadecimal: 0053 (S), 
TABLE XIII

Typing Procedures for Mixed Characters

To type "school, 학교, 學校", the following
procedures should be followed:

1. Type "school" by one keystroke for each character
without a function key.

2. In Korean, to type 학교:
first, type "ㅎ, ㅏ, ㄱ";
second, type "ㄱ, ㅛ, Default".

3. In Chinese, to type 學校: first, press
the function key, then type the three acronyms of
each character (Table IX), since for 學
the meaning is 배울 and the sound is 학, and
for 校 the meaning is 학교 and the sound
is 교.



0063 (c), 0068 (h), 006F (o), 006F (o), 006C (l), and 0 (Korean
character), 11111 (ㅎ), 00010 (ㅏ), 00001 (ㄱ); that is 0111
1100 0100 0001 (7C 41: 학), 0 (Korean character), 00001 (ㄱ),
10010 (ㅛ), 00000 (Default); that is 0000 0110 0100 0000 (06
40: 교), and 1 (Chinese character), 01010, 10100, 11110; that
is 1010 1010 1001 1110 (AA 9E: 學), and 1 (Chinese character),
11110, 00010, 11110; that is 1111 1000 0101 1110 (F8 5E: 校).
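
The bit layout used in this example (one function-key bit followed by three 5-bit codes) can be checked with a short sketch. The helper names `pack_code16` and `unpack_code16` are hypothetical; only the flag values and 5-bit codes are taken from the example above.

```python
def pack_code16(flag: int, first: int, second: int, third: int) -> int:
    """Pack bit 1 (0 = Korean, 1 = Chinese) and three 5-bit codes into 16 bits."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    for part in (first, second, third):
        if not 0 <= part <= 0b11111:
            raise ValueError("each code must fit in 5 bits")
    return (flag << 15) | (first << 10) | (second << 5) | third


def unpack_code16(code: int):
    """Split a 16-bit code back into (flag, first, second, third)."""
    return (code >> 15) & 1, (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


# The four syllables of the example, as (flag, 1st, 2nd, 3rd) 5-bit codes.
examples = [
    (0, 0b11111, 0b00010, 0b00001),  # Korean, first syllable
    (0, 0b00001, 0b10010, 0b00000),  # Korean, second syllable (third = Default)
    (1, 0b01010, 0b10100, 0b11110),  # Chinese, first character
    (1, 0b11110, 0b00010, 0b11110),  # Chinese, second character
]
hex_codes = [f"{pack_code16(*e):04X}" for e in examples]
print(hex_codes)
# prints: ['7C41', '0640', 'AA9E', 'F85E']
```

The packed values agree with the hexadecimal codes listed in the text.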

5. Operating System for Input and Output

To apply the suggested system, it is necessary to redesign
the operating system for input and output control.
Figure 4.3 Flowchart of Input and Output Controller.

Figure 4.3 shows the flowchart of input and output control.
First, the input and output controller has to distinguish
whether the Chinese character function key is "0" or "1". A
flag register can be used for the Chinese character function
key. If the flag is "1", then "1" is loaded into
the first bit of the 16-bit register, and 5-bit codes
are read until the register is full. When it is full, the
character is displayed on the CRT, and the 16-bit code is sent
to a buffer as data. Otherwise the flag is "0"; then
"0" is loaded into the register, and 5-bit codes are read
for a Korean character, or a 16-bit code is read for a Roman
alphabet character or a symbol, until the 16-bit
register is full. When the 16-bit register is full, the
identified character is displayed and sent to a buffer as data.
If the "stop edit?" condition shown in Figure 4.3 is "no", the
input and output controller loops to read a code,
display a character, and send the character code to a buffer.
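
The control loop described above might be sketched as follows. This is a simplified model, not the flowchart of Figure 4.3 itself: the event names ("fn", "code5", "roman", "stop") and the function `run_controller` are assumptions made for illustration, and the display step is omitted.

```python
from typing import Iterable, List, Tuple, Union

Event = Tuple[str, Union[int, str]]


def run_controller(events: Iterable[Event]) -> List[int]:
    """Turn key events into 16-bit codes, following the loop described in the text."""
    buffer: List[int] = []   # 16-bit codes sent on as data
    flag = 0                 # state of the Chinese character function key
    pending: List[int] = []  # 5-bit codes collected for the current character
    for kind, value in events:
        if kind == "stop":                   # "stop edit?" answered yes
            break
        if kind == "fn":                     # function key pressed or released
            flag, pending = int(value), []
        elif kind == "roman":                # one keystroke gives 00H + ASCII
            buffer.append(ord(str(value)) & 0x7F)
        elif kind == "code5":                # one of three 5-bit codes
            pending.append(int(value) & 0x1F)
            if len(pending) == 3:            # register is full
                c1, c2, c3 = pending
                buffer.append((flag << 15) | (c1 << 10) | (c2 << 5) | c3)
                pending = []
    return buffer


events = [
    ("roman", "S"),
    ("code5", 0b11111), ("code5", 0b00010), ("code5", 0b00001),
    ("fn", 1),
    ("code5", 0b01010), ("code5", 0b10100), ("code5", 0b11110),
    ("fn", 0),
    ("stop", 0),
]
result = run_controller(events)
print([f"{c:04X}" for c in result])
# prints: ['0053', '7C41', 'AA9E']
```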

This system will make Korean language
commands and programs easier to use than those presently
available. To achieve the above goals, the compiler and
interpreter, as well as the operating system, will require
redesigning. This system will require the complete rewriting of
all software currently used. The economic impact of this on
the Korean people will be enormous.

6. Design Considerations for Character Generation

There are two shapes of characters used in Korea:
Gothic (Figure 4.4) and Brush type (i.e., Ming style: Figure
4.5) [Ref. 7: p. 34]. To generate the above shapes of
characters, several methods of character generation can be
considered. To select the best method for Korean and Chinese
characters, one can use the following five criteria: speed,
space, quality, flexibility, and cost. Speed has a double
standard: speed of creation may range from a few minutes to
Figure 4.5 An Example of Brush Type.

a few hours, while speed of production should go beyond 1000
characters/sec, depending on type size and device resolution.
Space refers to the average size of the code for one character
as well as the size of the internal buffers often
needed for decoding. Quality is proportional to the largest
dot matrix which can be used to decode a character; it
should not be confused with the resolution of the output
device. For a given type size, the resolution sets the
definition, that is, the size of the matrix to be used;
definition, hence type size, is bounded by the quality of the
code. Flexibility refers to the different automatic
modifications which are supported by the code: scaling,
rotating, and family variations (such as going from light to
bold). Cost is self-explanatory [Ref. 12: p. 240].
Obviously the five criteria above are not independent.
Figure 4.6 shows the interrelationship of criteria in
designing a character generator [Ref. 12: p. 241]. The most
desirable feature is indicated by the direction of the
arrow. Solid (resp. dotted) lines indicate agreement (resp.




Figure 4.6 Criteria in Designing a Character Generator. 

contrariety) between the variation of the factors. The 
design of a digital character generator is an engineer's 
task whose goal is to strike the appropriate balance between 
the specifications for those five criteria combined with the 
characteristics of the production device, resolution and 
scanning, and the necessity of operating the corresponding 
creation station. 

Table XIV [Ref. 12: p. 268] gives a summary of the
main characteristics of the coding methods that the engineer
can utilize [Ref. 12: p. 269]. When the characteristics of
Korean and Chinese characters are compared against Table XIV,
it should be apparent that the bit map method is the most
TABLE XIV

Comparative Table for Performance

Method         Code space       Buffer space    Flexi-   Video   Reso-    Qty   Spd
               (bits)           (bits)          bility   scan    lution
                                                                 (n)

Bit map        n^2              0                        ++      <50            +++
Run-length     k*n*log2(n)      n                        ++      <100           ++
Chain-link     6*k*n            n^2                              <100
Differential   6*k*n +          n               -        +       <100     -     +
run length     b*(log2(n)+c)
Spline         k*log2(n)        k_m*log2(n)
Structural     k'*log2(n)       n^2

Legend: +: Good
        -: Bad
        b: The number of birth points in the character
        c: A constant taking care of bookkeeping
        n: The size of the matrix
        k: The average number of runs per matrix
           line or column (a number reflecting the simplicity
           of the character shapes: approximately 4
           for Roman body-text fonts, higher than
           10 for Chinese characters)


appropriate one for this application. It reduces code space
and buffer space. It has good video scan, high speed, and
highly readable low-quality printing. Unfortunately, this
method lacks flexibility. However, for all the other
aforementioned reasons, the bit map method is commonly used for
Korean and Chinese characters.
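
As a small illustration of the bit map method, the sketch below expands a dot pattern stored as hexadecimal bytes into rows of dots. The pattern string is the "H" entry as printed in Appendix F; reading it as eight rows of eight bits, most significant bit leftmost, is an assumption, and `decode_bitmap` is a hypothetical helper name.

```python
from typing import List


def decode_bitmap(hex_rows: str, width: int = 8) -> List[str]:
    """Expand a hex string (one byte per matrix row) into rows of '#' and '.'."""
    rows = []
    for byte in bytes.fromhex(hex_rows):
        # Assumed bit order: the most significant bit is the leftmost dot.
        rows.append("".join("#" if (byte >> (width - 1 - i)) & 1 else "."
                            for i in range(width)))
    return rows


glyph = decode_bitmap("004141417F414141")  # "H" pattern from Appendix F
print("\n".join(glyph))
```

No decoding step beyond this bit test is needed, which is why the method is fast; the price is that every size and style needs its own stored pattern.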
Because of the complexity of Korean and Chinese
characters, at least a 16 by 16 resolution is required for
Korean characters, and a 24 by 24 resolution for Chinese
characters. Resolutions of 32 by 32, 64 by 64, 80 by 80, 96 by
96, and 128 by 128 are desirable when a more pleasing
appearance is required and also when larger character sizes are
to be produced. However, if these characters are displayed on a
CRT with 32 by 32 resolution, with each character
7-10 mm square, this should be sufficient.

It is the authors' opinion that the less expensive
32 by 32 resolution CRT should be used for softcopy. One
reason for this is that the memory components
required to hold the character definitions are continually
getting less expensive. A stronger motivation, however, is that
high speed and flexibility of typing are then possible
[Ref. 1: p. 828]. IBM Corporation uses 16 by 16 resolution
for Gothic and 24 by 24 for Brush type Korean character
syllables [Ref. 8: p. 2]. FACOM Corporation uses 30 by 30
dots for Korean and Chinese characters and 24 by 30 dots for
the Roman alphabet, symbols, and the Korean alphabet (letters)
on a laser printer [Ref. 7: p. 47]. As cheaper dot matrix and
laser printers find their way into the marketplace, the
quality of characters will become less of an issue.
Presently there are few problems with the quality of
representing Korean characters that cannot be solved through
the additional expenditure of money. For the definition of each
character, the authors have presented two alternatives:
software and hardware (character generator). In order to
increase speed and usability, a hardware-oriented character
generator is best. If cost and flexibility are the
criteria, software-oriented character definition programs
are better.
Memory Space for Character Definition

To represent Korean and Chinese characters, one
needs to code 5,000 characters for character definition
(2,400 Korean characters; 1,800 Chinese characters; 800
user-definable characters, Roman alphabet, or symbols). If one
uses a 24 bit by 24 bit matrix font for each character,
at least 360 K bytes are required (3 bytes * 24 * 5,000) for
character definition, and 128 K bytes are required (64 K
16-bit addresses * 2 bytes for each address in the character
definition memory) for a look-up table. The total memory
requirement is 488 K bytes.
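
The arithmetic above can be reproduced with a short sketch. The function name `font_memory_bytes` and its parameters are hypothetical. Note that the text counts the pattern memory in thousands of bytes (360 K) and the look-up table in units of 1,024 bytes (128 K), so the exact byte counts differ slightly from the rounded total.

```python
def font_memory_bytes(n_chars: int, dots: int,
                      address_bits: int = 16, bytes_per_address: int = 2):
    """Return (pattern bytes, look-up table bytes) for a dots-by-dots font."""
    row_bytes = (dots + 7) // 8                  # 24 dots need 3 bytes per row
    pattern_bytes = row_bytes * dots * n_chars   # 3 * 24 * 5,000
    table_bytes = (2 ** address_bits) * bytes_per_address  # 64 K * 2
    return pattern_bytes, table_bytes


pattern, table = font_memory_bytes(n_chars=5000, dots=24)
print(pattern, table)
# prints: 360000 131072
```

Changing `dots` to 32 shows how quickly the pattern memory grows with resolution (4 * 32 * 5,000 = 640,000 bytes).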

A large memory space is required for the definition 
of the characters. Data compression of these characters can 
be considered for two different purposes: data transmission, 
and computer storage and output. Here one is mainly inter- 
ested in the latter case, where the main point is the total 
data amount to be stored. The method of data compression of 
Chinese characters can be classified by using the method- 
ology listed in Table XV [Ref. 1: p. 820]. 

There is a problem associated with the enlargement
and alignment of character patterns. The clarity of a
character depends on the size of the reproduction. If a large
size is required, the resolution must be high. Otherwise,
stepwise zigzags appear, which some people find unbearable.
Avoiding them requires that patterns for all the different font
sizes be stored, which is uneconomical. Reproducing different
character sizes from the same data is desired. However, the
enlargement and shrinking of character patterns from a
single set of data is quite difficult because, if the addition
or the deletion of a bit by interpolation is not
done properly, it has a negative influence on the aesthetics.
In enlargement, the smoothness of an edge is particularly
important, while in shrinking the gap between strokes must
be carefully maintained.
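
The zigzag effect can be seen in a minimal sketch of the simplest enlargement scheme, in which every dot is repeated. This dot replication is chosen here only for illustration and is not a method proposed in the study; `enlarge` is a hypothetical helper name.

```python
from typing import List


def enlarge(rows: List[str], factor: int) -> List[str]:
    """Enlarge a dot pattern by repeating each dot `factor` times in both directions."""
    out = []
    for row in rows:
        wide = "".join(dot * factor for dot in row)  # stretch horizontally
        out.extend([wide] * factor)                  # repeat vertically
    return out


diagonal = ["#.", ".#"]          # a two-dot diagonal stroke
bigger = enlarge(diagonal, 2)
print("\n".join(bigger))
# prints:
# ##..
# ##..
# ..##
# ..##
```

The one-dot diagonal becomes a staircase of 2 by 2 blocks; smoothing such edges requires adding or deleting dots by interpolation, which is the difficulty described above.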
TABLE XV

Varieties of Data Compression Methods

Purpose:         transmission
                 memory & reconstruction
                 enlarge/shrink

Unit:            page unit
                 character unit

Representation:  dot pattern representation
                 stroke representation

Methods:         run-length coding
                 two-dimensional predictive coding
                 coding by scan line pattern unit
                 coding by m by n block pattern unit
                 checker board sampling
                 hexagonal board sampling
                 contour coordinate coding
                 contour following coding
                 mathematical equation for strokes
                 synthesis from partial character patterns



A comparative review of the options contained in 
Table XV with regard to determining memory size is very 
difficult. This is because the requirements for character 
print qualities are quite different depending on each 
method. Simplicity in the hardware and software implementa- 
tion of the compression and reconstruction of characters is 
a very important consideration. Generally speaking high 
data compression methods need complex hardware and longer 
times for reconstruction. Therefore, the tradeoff to be 
considered is between the data compression ratio and the 
memory size. This represents the classic economic tradeoff 
between the hardware/software cost with regard to the speed 
of character regeneration. 
Because memory components are
becoming less expensive, the high-speed, simple reconstruction
method is preferred despite the necessarily large
memory. Many commercial machines have adopted this concept,
and store the character dot patterns as they are, without any
data compression. For example, IBM machines use only a 12 by
24 font size for simple letters (Roman and Korean alphabet
and symbols) instead of a 24 by 24 font [Ref. 8: p. 17].
The FACOM machines use software definition of the second
level of Korean and Chinese characters, which are not used
frequently, for data compression [Ref. 7: p. 12]. Because of
the reduction in the price of memory, the marketplace has
shifted towards providing direct character storage, i.e., a
large memory, instead of utilizing data compression.
V. EVALUATION OF SUGGESTED METHODS

The principal problems in current editing technology for
Chinese and Korean characters were detailed in Chapter III.
Fundamentally, the problems cause user inconvenience,
require lengthy input procedures, and result in complex
update requirements. The authors' suggested methods will
solve most problems which are encountered in Korean language
data processing. More research and development remain in the
following areas:

First, in an enumeration method, there is no problem 
except low readability to Koreans and the inability to 
represent Chinese characters. In this case, Chinese charac- 
ters are ignored because Korean language data can be repre- 
sented through the use of only Korean characters without 
serious problems. Low readability is caused by unfamiliarity 
and the unbalanced shape of each letter when written by an 
enumeration method. With a minor change of shape of the 
letters, this method will eliminate the above problems. 

Second, the 16-bit code for the three kinds of charac- 
ters requires the consideration of the following problems: 

1. The 32 key tops are complex since each key top repre- 
sents three or four letters and acronyms. One solu- 
tion to this problem is to use lighted, changeable 
key tops which represent only one letter or acronym 
at a moment according to the function keys and the 
order of keystrokes ( i. e. , 1st, 2nd, 3rd letter and 
acronym ) . 

2. The user must remember whether a letter to be typed 
is the first, second, or third letter, and whether it 
is an acronym of a sound or a meaning character. 
3. If a user does not know the meaning of a certain
Chinese character to be typed, the user must look it up in a
table which shows all meanings and sounds of all
possible Chinese characters.

4. In typing Korean characters which consist of only
first and second letters, a user has to hit the
default key to make a 16-bit code. Instead of second
letter and default keys, one can use twenty-one more
second letter keys which generate a 10-bit code for the
second and third letters. Unfortunately, this will
make the keyboard more complex.

5. Despite the authors' analysis of sound and
meaning characters and careful rearrangement of these
codes, duplicate codes still exist. This is because
of the irregularities caused by the natural evolution
of sound and meaning characters over more than 2,000 years.
Generally, 3,000 or more Chinese characters will cause
duplicate codes to increase proportionally. In order
to eliminate the duplicate codes, the Korean language
committee needs to take measures to clarify the meanings
of the Chinese characters that cause duplicate
codes to exist.

Before the actual construction of the suggested system,
an economic (cost/benefit) analysis needs to be considered.
Given the r % discount rate and the various yearly costs and
benefits estimated from past data, Table XVI [Ref. 14] shows
the formula which can be used to derive the net
present value of this project. This simply states that the
net present value (NPV) is equal to the sum of the differences
between benefits (B) and costs (C) in each year (i) of
the project life (T), divided by the relevant factor (r) for
that year. The current estimate of the market size for word
processing in Korea is $2.5 million annually (Korean Daily
Times, Sep 10 1984). But this estimate will be in inverse
TABLE XVI

Net Present Value Formula

$$\mathrm{NPV} = \sum_{i=1}^{T} \frac{B_i - C_i}{(r)^{i}}$$

Legend:

NPV: Net Present Value
B: Benefit
C: Cost
i: Each year
T: Project life
r: Relevant factor






proportion to the price of the system and will be in direct 
proportion to the usefulness and the user-friendliness until 
maturation. 
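
A net present value calculation of this kind can be sketched as below. The sketch assumes the common convention that the factor for year i is (1 + rate) raised to the power i, with the discount rate given as a decimal fraction; that reading of the "relevant factor" is an assumption, `net_present_value` is a hypothetical name, and the yearly figures are invented purely for illustration.

```python
from typing import Sequence


def net_present_value(benefits: Sequence[float], costs: Sequence[float],
                      rate: float) -> float:
    """Sum (B_i - C_i) / (1 + rate)**i over years i = 1..T.

    Assumption: the yearly factor is (1 + rate)**i, the usual discounting rule.
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same years")
    return sum((b - c) / (1 + rate) ** i
               for i, (b, c) in enumerate(zip(benefits, costs), start=1))


# Hypothetical three-year project (arbitrary units), discounted at 10 %.
npv = net_present_value(benefits=[50.0, 120.0, 150.0],
                        costs=[100.0, 40.0, 40.0],
                        rate=0.10)
print(npv > 0)  # the project is feasible for this party only if NPV is positive
```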

In the above formula, the market price of the system 
influences the benefits for manufacturers and costs for 
users. This system is feasible when the net present values 
are positive for both manufacturers and users. If the 
benefits for manufacturers and the costs for users are 
constant in a system, the main problems will be: 

1. How to minimize the costs for manufacturers 

2. How to maximize the benefits for users 

To solve the above problems, the best approach will be to 
make an efficient and user friendly system for Korean 
language data processing. This will increase the number (N) 
of systems sold, and increase the individual productivity of 
the users. 

There are many factors and constraints which cause high 
cost in implementing this method. Among these, the following 
three factors affect the cost performance ratio for both
manufacturers and users:

1. The initial design cost: For this system, an organization
has to invest initially in the design of
about 5,000 Korean and Chinese character patterns,
and the system software and hardware. As the number
of systems produced by a manufacturer increases,
the unit cost of each system will decrease as the
costs are spread over more units.

2. Cost for character generator: As mentioned in Chapter
IV, one needs about 500 K bytes of memory capacity for
these character definitions. The cost of memory is
decreasing and the speed is increasing as technology
develops. This cost is an initial cost to
users when buying a system.

3. Cost for hardcopy: One can consider three kinds of
printer for hardcopy: dot matrix printer, chain
printer, and laser printer. It is not practical to
use a chain printer for our system since the chain
would be approximately twenty meters long (5,000
characters * 4 mm per character) and it would be
prohibitively slow. Currently, dot matrix printers
and laser printers cost more than chain printers, but
they are the only viable options.

Among the three kinds of cost, the third one is the most
serious since the cost of hardcopy increases as its use
increases. Recently laser printers have become more popular
for these characters because of their good quality, high speed,
and decreasing price. Comparatively, though, laser printers
are still relatively expensive.
VI. RECOMMENDATION AND CONCLUSION



As the demand for data processing in Korea increases, 
users will continue to encounter more and more problems in 
utilizing the Korean language for data processing. The 
current methods of convergence, display and select to imple- 
ment the Korean language in data processing must only be 
considered as interim measures due to their inefficient and 
time consuming means of data entry. In order to prevent this 
problem from becoming more complicated due to the develop- 
ment of various new implementations forwarded by independent 
research, a standardized system must be developed. 

This study examined two possible solutions for using the 
Korean language in data processing. The enumeration method 
is technologically feasible, inexpensive, easy to implement, 
but could not be used for applications within the Korean 
data processing environment. This is because it results in a 
textual form of Hangul that is unfamiliar to most Korean 
people. Therefore the current enumeration method is not a 
feasible solution to the Korean data processing problem. 

The second method examined was based on a 16-bit code 
representation of Korean, Chinese characters, and the Roman 
alphabet. This method was found to possess all the advan- 
tages currently realized by the EBCDIC or ASCII code repre- 
sentation of western countries. The only drawback to this 
system is that it might not be cost effective based on 
current technology. However, due to the rapid development of 
hardware and software technology, a cost effective means 
should be available within the next few years. 

In order to accelerate the determination of a thorough 
broad based solution to the Korean data processing problem, 
the Korean government must organize and charge a national 
level committee with the responsibility for investigating
the problem and determining a viable solution. This study,
with its proposal of a 16-bit character code, should be
provided to that committee for further examination. This
proposal represents a concept that could eventually lead to
a long-term viable solution to the data entry and processing
problems of using the Korean language.
APPENDIX A

THE EVOLUTION OF CHINESE CHARACTERS

APPENDIX B

EBCDIC INPUT CODE

APPENDIX C

MDS INPUT CODE

APPENDIX D

IBM MULTISTATION 5550 KEYBOARD

APPENDIX E

FACOM OS IV (KEF) KEYBOARD

APPENDIX F

LOAD COMMAND PROGRAM FOR CURRENT KEYBOARD



l(a"KC") 

1- "oo; 

l**fl“00081422417F41i*l 
1"D“Q07C22212121227C 
1"E*‘007F40407C40407F 
1"F"007F40407C404040 
’"G"00 1E21 40404721 IE 
1“H"004141417F414141 
1"I”003E0808080808 3E 
l''0“003E41 41414141 3E 
1"P"007E41417E404040 
1"Q*'0C3E41 41414542 3D 
1«R"007E41417E444241 
1“S"003E41403E0 *41 3E 
1"T"007F080808080808 
1"U"00414141414141 3E 
1 “W-0041 41 41 494955 22 
1"Y"0041221408080808 
l"a”00007F414l4141 7F 
l"d"00001C224l4122lC 
1"3''OOOC7F404040407F 
l"f«00C07F0101 TfuO IF 
l"^«0000087F00 3E41 3E 
l«h”000008080808087F 
1 “ 1 " 0 0 0 0 2 0 3 E 2 0 2 0 3 E 2 0 
1 •’©"0C304242427E4242 
1 "P"0000090909 790909 
l“a“00004lHl7F4l41 7F 
l'*r"OOOC7F0101010102 
I ••s"0C004040404040 7F 
1 •• t " 0 0 0 0 0 8 0 8 0 8 1 4 2 2 4 1 
1”li"0000011FC1011F01 
l"u;''00007F 0808 1 422 41 
l’*/"000C22222222227F 
1" ! "0008080908080008 
l"l"OG081829090e083E 
1"<" 000 106 1860 18 06 01 
l"ii"0C7E21213E21217£ 
l"C"00iE21404040211£ 
l"J"000E0404G404^43a 
1"K"00434C 5860 584C43 
1"L"OC4C404040404C7F 
1-M”00416355494141 41 
1 "N"004 161 5 1 4945434 1 
1 " V"004 141 2222 1 41 C08 



1 " 1 " 0 0 0 0 0 8 0 8 0 6 0 8 0 8 0 8 ; 
l"m"000000000000007F; 
l-n"00007F0808080808; 
1 "v“00007F222222227F; 
1 "k" 0000 7F007F40407F ; 
l"z"00007F011F010204: 
I ", "0000000000 302040 ; 
1". "0000000000 1818 ; 
1">"002018060106182C; 
1'"'000022147'=1422: 
1"'" 0000494977494977; 
l":"000003087F0808: 

1 "; ••0000097909097909 ; 
1 "C^'00007F 2222225549 ; 
1"3"0C00774444444477; 
1"("0000602010: 
1">"000E10107010100E ; 
l"«^^0C24247E247£2424; 
1"$"00033E483E093£08; 
l-?^*0051 o2040a 1 C 2 343 : 
1 "e**00060408 ; 

1"( ••0004020101010204; 
I " )"OOOCOOOOOOOCOC'=F ; 
l"i:*^001C20*.04040201C; 
i"*"00i8i80cooiei3; 

1 "-••003000 7F ; 

1 -/"000C222236494949 ; 
1"0"0C1C22454951221C; 
1"2"003C42010E30407F; 
1"3'*007F 0 2040E01413E; 
1-4"00040C14247F0404: 
1"5"OC7F407E410141 3E ; 
1"6'^001£21407E61211E; 
I "7"00 7F 01 0204 0 6 1C2C ; 
1“9"003E41413E41413E; 
l"9 '•0 0 3C 424 3 3D0 1 42 3C: 
l"="000C0C00000C331C; 
I"?^’ 003c41060e090008; 

1 " 3 " 00242424 ; 

1"\"0000771111111122; 
1 "^"003344^4364546 39; 
1"_"0000007F007F; 
1"*^^004141FF41495522 ; 
APPENDIX G

LOAD COMMAND PROGRAM FOR CAPITAL LETTER KEYBOARD 



l(a-lee") 

1" "oo; 

1"!"0000007F01010101020C; 
l"*"000000007F03142241«.i: 
1"$**000000083E^83E093£08; 
1”?»00000000007FOC22227F; 
1"'" 0000000000001813; 

1“," 00 000039090979090909; 

I". "000000050525277005 35; 

l"/« 00000009097909790 9 09; 

1"0" 0000000570152550^5-^5; 

l"l“OOFF01010101010101Ci; 

1"2"007F41^141414141417p; 

1"3”007F38061422414141h1; 

1"4"007F0C2424242424247F; 

l”5"0C494949497F4949‘t97F; 

l"6"0000004040407E40404C; 

1"7"OOOC000000080808037F; 

l"3"OOOOOOOOOOOC00307F: 

1"9" 00000001 0,1 0101030579; 

1"; "^0 0000001701121274141 ; 

1"2"OOOOOCOC7=414141417F; 

l"a"OC0000007F4040404C7F; 

1"3"0000007515157745-377; 

1"C”000000007F307F40407F; 

1"0"30000C007F013001020C; 

l”r"000C001C007F03l4224l; 

1*'F” 30000077090909112244; 

1"G" 00000077151575454577; 

l"H" 00000000771111714172; 

1"Q"0000000040404040407F; 

l"?»OO0OCOlCJO7F0O3ev.l3£: 

1"5" 00000000040814224141 ; 

1”T"000000001212324C4949; 

l"V"00 000000721212254949; 

1"WOC00000041^17F41417p; 

1"X"OOOOOOOC1C224141221C; 

1"Y" 00 000000771015754077: 

1”Z”000000007FJ1017F407F; 

1"-'«OC0000064G‘»F*»0464976; 

l"a"007F404040‘»04040407F; 

1"c"007F00007F-«04040407F: 

l"cJ”007F3101013F0101C10l: 

1"?”001C007F331422414141; 



l"k"000000004444447C4444 

1"1"OOOOOC01017F09091161 

l-m”000000007F 1414 141414 

l"n"00000002023E023E0202 

l"o"000000011111117F0101 

1"3"000000040424277C0404 

l"q"00404040404C4040407F 

l"r"OClC007F001C2241221C 

l"s"0C01 0204081422414141 

l"t”0C7F242424344A494949 

l"u"000000000C141414147F 

1 **v**00 77 4444444444444477 

1«U(”00414141417F4141417F 

l"x"001C22414l4l414122lC 

l"y" 00000 040407C4C7C404C 

l“:r"007FaiC‘lC17F4040407F 

l"t"0000001030501010107C 

1"("0000007F0204Dcoi413£ 

1")”000000040C14247F0404 

l-t-« 0000003C42310£30407E 

1-I''0000001C22405E51211E 

l"U"OC003C7r405e6101413£ 

1 '••'0C0C22147F1422: 

1"»"0016180C00131£; 

1 «-«000000 7F ; 

l":"0000030&7r0308: 

l"<"000106186C13060i; 

1"=”OOOCOCOOOOOC0810; 

1->"OC2O10O6O1O6182O; 

1"?"003E41 J60608000S: 

1"J"003E41413E4141 JE: 

1 "K "00 3C42-. 3 3C 0 1 -.2 3C ; 
1"L" 003 8 444436454639: 
1»*'”0C1C224549S1221C; 
1"N" 000803030808 0006; 

1 "3" 00 7F 0 1 02 04 0 3 1 C 20 ; 
1"P" 00616204 08102343: 
1"C"OC102C'^040‘.02010; 
l"\" 00060406: 
1"3"000402010101C204; 
1"_"0000307F007F; 

1-'" 004141 =F4]405522: 